A mock data set consisting of 100 data points, say \(\mathbf{P}_1, \cdots, \mathbf{P}_{100}\), is generated below. They depict a perturbed and slanted ellipse in \(\mathbb{R}^3\). Let \(X_{raw}\) be the \(100 \times 3\) matrix whose rows consist of \(\mathbf{P}_1, \cdots, \mathbf{P}_{100}\).

set.seed(2018)
n <- 100
## Create a perturbed and slanted ellipse in R^3
x_center <- 3; y_center <- 1; z_center <- 2 
x_stretch <- 4; y_stretch <- 3; z_stretch <- 1
angle <- pi/3

x_ori <- x_stretch*cos(2*pi*(1:n)/n) + rnorm(n, mean = 0, sd = 0.5) # stretch
y_ori <- y_stretch*sin(2*pi*(1:n)/n) + rnorm(n, mean = 0, sd = 0.5) # stretch

x <- cos(angle)*x_ori - sin(angle)*y_ori + x_center # rotate
y <- sin(angle)*x_ori + cos(angle)*y_ori + y_center # rotate
z <- z_stretch*sin(2*pi*(1:n)/n) + z_center # make it slanted

Xraw <- data.frame(x, y, z) 
Xraw <- Xraw[sample(nrow(Xraw)), ] # permute rows to randomize P1, P2, P3
dim(Xraw)
## [1] 100   3

The following is an interactive 3D plot showing the distribution of points:

library(plotly)
colored <- c("P1", "P2", "P3", rep("P4 - P100", n-3))
plot_ly(Xraw, x = ~x, y = ~y, z = ~z, 
        color = colored,
        colors = c("red", "blue", "green", "grey")) %>%
  add_markers() 

Our goal is to find a good (meaning to be clarified later) two dimensional representation \(\mathbf{Q}_1, \cdots, \mathbf{Q}_{100}\) of \(\mathbf{P}_1, \cdots, \mathbf{P}_{100}\) so that we can reduce the dimension (from 3 to 2) of data and visualize them. If \(x\), \(y\), and \(z\) variables need to be equally weighted, then the data points should be centered and normalized. Let \(X\) be the centered and normalized version of \(X_{raw}\).

X <- scale(Xraw) %>%
  as.data.frame()
plot_ly(X, x = ~x, y = ~y, z = ~z, 
        color = colored, 
        colors = c("red", "blue", "green", "grey")) %>%
  add_markers() 

Consider a projection \(P\) onto a two dimensional subspace \(W\) of \(\mathbb{R}^3\) and let \(Y=XP\). Since \(X\) is centered, so is \(Y\) and hence the total variance of \(Y\) is given by \(\frac{1}{n-1}\|Y\|_{HS}^2\), where \(\|\cdot\|_{HS}\) denotes the Hilbert-Schmidt norm. By a good two dimensional representation, we mean finding a projection \(P\) such that \(\frac{1}{n-1}\|Y\|_{HS}^2\) is maximized, where \(Y=XP\). As \(P\) is a projection matrix, \(P=P^t=P^2\) and it follows that \[\|Y\|_{HS}^2=tr((XP)^t(XP))=tr(P^tX^tXP)=tr(X^tXPP^t)=tr(X^tY)\] and that

\[\begin{aligned} \|X-Y\|_{HS}^2&=tr((X-Y)^t(X-Y))\\ &=tr(X^tX)-tr(X^tY)-tr(Y^tX)+tr(Y^tY)\\ &=tr(X^tX)-tr(X^tY)-tr((Y^tX)^t)+tr(Y^tY)\\ &=tr(X^tX)-2tr(X^tY)+tr(Y^tY)\\ &=\|X\|_{HS}^2-\|Y\|_{HS}^2, \end{aligned}\]

which implies that \(\|Y\|_{HS}^2\) is maximized when \(\|X-Y\|_{HS}^2\) is minimized.

Let \(X=U\Sigma V^t\) be a singular value decomposition, where \(U\) is a \(100 \times 100\) orthogonal matrix, \(V\) is a \(3 \times 3\) orthogonal matrix, and \[\Sigma=\left[\begin{array}{ccc} \sigma_1 & & \\ & \sigma_2 & \\ & & \sigma_3 \\ & & \\ & & \end{array}\right]_{100\times 3}\] with \[\sigma_1\geq \sigma_2\geq \sigma_3.\] On one hand, by Eckart–Young–Mirsky theorem, \[\min_{rank(Y)=2} \|X-Y\|_{HS}^2 = \sigma_3^2\] On the other hand, let \(Y_0=XP_0\), where \(P_0=VQV^t\) and \[Q=\left[\begin{array}{ccc} 1 & & \\ & 1 & \\ & & 0 \end{array}\right],\] then \(P_0\) is a projection of rank 2 satisfying \(\|X-Y_0\|_{HS}^2 = \sigma_3^2\). This is our desired projection!

Note that \(XP_0=U\Sigma V^tVQV^t = U\Sigma_0V^t\), where \[\Sigma_0=\left[\begin{array}{ccc} \sigma_1 & & \\ & \sigma_2 & \\ & & 0 \\ & & \\ & & \end{array}\right]_{100 \times 3}.\]

In particular, this shows that the first two columns of \(V\) are basis for \(W\), the subspace associated with \(P\), and the first two columns of \(U\Sigma_0\) provide the desired coordinates \(\mathbf{Q}_1, \cdots, \mathbf{Q}_{100}\). This can be obtained as in the following using an R function svd():

SVD <- svd(X, nu = n)
U <- SVD$u # 100 x 100
Sigma <- rbind(diag(SVD$d), matrix(0, nrow = n-3, ncol = 3)) # 100 x 3
V <- SVD$v
X_recover <- (U %*% Sigma) %*% t(V) # This recovers X 
Q <- U %*% Sigma # The first two columns of Q gives coordinates Q1, Q2, ..., Q100
head(Q[, 1:2])
##            [,1]         [,2]
## [1,] -1.6186598  0.555885499
## [2,] -1.6230094  0.065770248
## [3,]  0.3854602 -1.468867955
## [4,] -1.8899114  0.572046751
## [5,] -1.8833910 -1.172104935
## [6,]  1.7828531  0.003271112

Remark: \(U\Sigma\) (and hence \(U\Sigma_0\)) can be computed from the orthogonal diagonalization of \(XX^t\), as
\[XX^t = U\Sigma V^t(V\Sigma^tU^t) = U\Lambda U^t,\] where \[\Lambda = \Sigma\Sigma^t= \left[\begin{array}{ccccc} \sigma_1^2 & & & &\\ & \sigma_2^2 & & & \\ & & \sigma_3^2 & & \\ & & & 0 & \\ & & & & \ddots \end{array}\right]_{100 \times 100}.\]

The R function prcomp() easily computes the coordinates \(\mathbf{Q}_1, \cdots, \mathbf{Q}_{100}\) from \(X_{raw}\):

Y <- prcomp(Xraw, center = T, scale = T)$x %>%
  as.data.frame() # PC1 and PC2 of Y represent coordinates of Q1, Q2, ..., Q100 
head(Y)
##           PC1          PC2        PC3
## 74 -1.6186598  0.555885499 -0.1671973
## 81 -1.6230094  0.065770248 -0.1777017
## 6   0.3854602 -1.468867955 -0.1219480
## 77 -1.8899114  0.572046751  0.1524381
## 87 -1.8833910 -1.172104935  0.1342773
## 27  1.7828531  0.003271112  0.1572334

The projected points on the PC1-PC2 plane viewed from PC3 axis:

scene <- list(camera = list(up = list(x = 1, y = 0, z = 0),
                           eye = list(x = 0, y = 0, z = 2)))
plot_ly(Y, x = ~PC1, y = ~PC2, z = ~PC3, 
        color = colored,
        colors = c("red", "blue", "green", "grey")) %>%
  add_markers() %>%
  layout(scene = scene, autosize = T)

Finally, one can get the following 2D plot as a result of dimension reduction:

Y %>% mutate(color_code = colored) %>%
  ggplot(aes(PC1, PC2, color = color_code)) +
  scale_color_manual(values = c("red", "blue", "green", "grey")) +
  theme(aspect.ratio = 1) +
  geom_point()